Bounds for Regret-Matching Algorithms
Authors
Abstract
We study a general class of learning algorithms, which we call regret-matching algorithms, along with a general framework for analyzing their performance in online (sequential) decision problems (ODPs). In each round of an ODP, an agent chooses a probabilistic action and receives a reward. The particular reward function that applies at any given round is not revealed until after the agent acts, and the reward function may change arbitrarily from round to round. Our analytical framework is based on a set Φ of transformations over the agent’s set of actions. We calculate a Φ-regret vector by comparing the reward obtained by an agent over some finite sequence of rounds to the reward that could have been obtained had the agent instead played each transformation φ ∈ Φ of its sequence of actions. Regret-matching algorithms select the agent’s next action based on the vector of Φ-regrets together with a link function f. In this paper, we derive bounds on the regret experienced by (Φ, f)-regret-matching algorithms for polynomial and exponential link functions and arbitrary Φ. Whereas others have typically derived bounds on distribution regret, our focus is on bounding the expectation of action regret. A simplification of our framework, however, yields bounds on distribution regret equivalent to (and in some cases slightly stronger than) others that have appeared in the literature.
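The abstract's core mechanism can be illustrated for the simplest choice of Φ (the constant transformations, yielding external regret) and the polynomial link function f(x) = max(x, 0): the next mixed action is chosen proportionally to the positive part of the cumulative regret vector. The sketch below is an illustration under those assumptions, not the paper's general (Φ, f) construction; the `reward_fns` interface is hypothetical.

```python
import numpy as np

def regret_matching(reward_fns, n_actions, rng=None):
    """Play one round of regret matching per reward function.

    Assumed interface: each element of reward_fns maps an action index
    to a reward (here in [0, 1]).  Uses the polynomial link
    f(x) = max(x, 0), so the mixed action is proportional to the
    positive cumulative external regret.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    regret = np.zeros(n_actions)   # cumulative external-regret vector
    total_reward = 0.0
    for r in reward_fns:
        rewards = np.array([r(a) for a in range(n_actions)])
        pos = np.maximum(regret, 0.0)          # link function applied entrywise
        if pos.sum() > 0:
            probs = pos / pos.sum()            # regret-matching distribution
        else:
            probs = np.full(n_actions, 1.0 / n_actions)  # no positive regret: play uniformly
        action = rng.choice(n_actions, p=probs)
        total_reward += rewards[action]
        # external regret update: gain of each fixed action over the one played
        regret += rewards - rewards[action]
    return regret, total_reward
```

With the exponential link, `pos` would instead be replaced by exponentiated regrets, recovering a Hedge-style update; the paper's bounds cover both families.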
Related Works
Bounds for Regret-Matching Algorithms
We introduce a general class of learning algorithms, regret-matching algorithms, and a regret-based framework for analyzing their performance in online decision problems. Our analytic framework is based on a set Φ of transformations over the set of actions. Specifically, we calculate a Φ-regret vector by comparing the average reward obtained by an agent over some finite sequence of rounds to th...
Fair Algorithms for Infinite Contextual Bandits
We study fairness in infinite linear bandit problems. Starting from the notion of meritocratic fairness introduced in Joseph et al. [9], we expand their notion of fairness for infinite action spaces and provide an algorithm that obtains a sublinear but instance-dependent regret guarantee. We then show that this instance dependence is a necessary cost of our fairness definition with a matching l...
Lower Bounds on Regret for Noisy Gaussian Process Bandit Optimization
In this paper, we consider the problem of sequentially optimizing a black-box function f based on noisy samples and bandit feedback. We assume that f is smooth in the sense of having a bounded norm in some reproducing kernel Hilbert space (RKHS), yielding a commonly-considered non-Bayesian form of Gaussian process bandit optimization. We provide algorithm-independent lower bounds on the simple ...
No-regret algorithms for structured prediction problems—DRAFT
No-regret algorithms are a popular class of learning rules which map a sequence of input vectors x1, x2 . . . to a sequence of predictions y1, y2, . . .. Unfortunately, most no-regret algorithms assume that the predictions yt are chosen from a small, discrete set. We consider instead prediction problems where yt has internal structure: yt might be a strategy in a game like poker, or a configura...
Regret bounds for Non Convex Quadratic Losses Online Learning over Reproducing Kernel Hilbert Spaces
We present several online algorithms with dimension-free regret bounds for general nonconvex quadratic losses by viewing them as functions in reproducing kernel Hilbert spaces. In our work we adapt the Online Gradient Descent, Follow the Regularized Leader and the Conditional Gradient method meta algorithms for RKHS spaces and provide regret bounds in this setting. By analyzing them as algorith...